(57) ABSTRACT

An architecture is provided for sharing text-to-speech (TTS) 
resources. A TTS controller manages the allocation of the 
TTS resources. An application provides a conversion request 
which is provided to a first queue. An available TTS resource 
begins a conversion upon sentence boundaries and converts 
a predetermined minimum amount of text. Once a sufficient 
amount of text is converted, the digitized speech data is 
played to a user. The amount of converted data is monitored 
during the playback operation. As the totality of the con- 
verted data falls below a predetermined minimum the TTS 
controller is notified. If more text remains in a message 
being converted, the TTS controller places a request into a 
second queue. The second queue has a higher priority so that 
continuing conversions are completed before subsequent 
conversions begin. The user is able to cancel this conversion 
operation at any time. By cancelling this conversion 
operation, TTS resources are conserved by not unnecessarily 
converting the whole text message. 

9 Claims, 4 Drawing Sheets 
FIG. 2

200 Receive convert request (inc. text)
210 Create shared file to store converted audio data
220 Initialize background convert operation using shared file
230 While (dataAvailable) AND (NOT CancelledByUser)
        Playback audio from shared file

Background conversion process

240 Insert in InitializationRequestQ
250 Until ((AudioPointer - PlayPointer) > UnplayedInitializationHighThreshold)
        Convert
260 Until (((AudioPointer - PlayPointer) < UnplayedLowThreshold) AND NOT DONE)
        Pause
270 Insert in RestartRequestQ
280 Until (((AudioPointer - PlayPointer) > UnplayedHighThreshold) AND NOT DONE)
        Convert
290 Goto 260

Note: DONE is true when all text is converted, or when playback is cancelled by
the user. This state terminates the conversion process.
SHARED TEXT-TO-SPEECH RESOURCE

FIELD OF THE INVENTION

This invention relates to the field of text-to-speech conversion, especially in a voice messaging and communications setting. More particularly, this invention relates to a method of and apparatus for efficient sharing of a text-to-speech conversion resource in a unified messaging application.

BACKGROUND OF THE INVENTION

Increasing numbers of users are accessing e-mail messages. At its inception, a user necessarily could only review an e-mail message from their desktop, either from a terminal or personal computer (PC). Modern users require more freedom which prompted remote e-mail access, for example via a laptop computer and modem. More recently, users' desire for more efficient access to e-mail has prompted the introduction of voice delivered e-mail. In voice delivery, a machine or human operator reads the e-mail message directly from the caller's mailbox. The merging of text and voice messaging into a single delivery source is known in the art as Unified Messaging. This allows the recipients to retrieve their e-mail messages at any time they have access to a telephone. Owing to cellular and satellite telephony technology, such a system, in essence, allows users to access their e-mail at any time and from almost any place.

The machine conversion of an e-mail message to voice message utilizes a text-to-speech (TTS) conversion resource. Unified Messaging applications, in addition to other applications which read text over the telephone, use a TTS conversion resource. As is well known in the art, TTS can be implemented in either host-based software or using separate voice processing hardware. In either form it should be considered as a 'scarce resource'. TTS is expensive in either throughput or hardware expenditures. In the host-based software implementation the CPU cycles associated with conversion limit the number of concurrent conversions which a single system can support. Using separate voice processing hardware incurs additional cost and consequently there is a need to operate with a limited number of resources.

Often users do not listen to long recitations of detailed e-mail messages. Rather, users will listen to a first part of the message then skip the remainder until they return to their PC or laptop computer and review the details of the e-mail message in text format. Converting such a message in its entirety would in essence be a wasteful use of a scarce resource.

For at least these reasons, it is desirable to perform TTS conversions on demand. In other words, the conversion is performed when the user is on the telephone and determines that they want to hear their e-mail messages. Unless there was a dedicated TTS resource for each user, the likelihood exists that a user would be required to wait an extended period of time for other users to complete the review of their e-mail messages so that the TTS resource will be available. Under certain circumstances, this delay could prevent the user from retrieving their e-mail messages until a later time.

What is needed is a more efficient method and apparatus for sharing a TTS resource.

What is further needed is an efficient just-in-time sharing of a TTS resource.

SUMMARY OF THE INVENTION

An architecture is provided for sharing text-to-speech (TTS) resources. A TTS controller manages the allocation of the TTS resources. An application provides a conversion request which is provided to a first queue. An available TTS resource begins a conversion upon sentence boundaries and converts a predetermined minimum amount of text. Once a sufficient amount of text is converted, the digitized speech data is played to a user. The amount of converted data is monitored during the playback operation. As the totality of the converted data falls below a predetermined minimum the TTS controller is notified. If more text remains in a message being converted, the TTS controller places a request into a second queue. The second queue has a higher priority so that continuing conversions are completed before subsequent conversions begin.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an embodiment of a unified messaging system constructed to take advantage of the present invention.

FIG. 2 is a logic diagram of an embodiment of the present invention.

FIG. 3A is a time line of a sample operation of the present invention.

FIGS. 3B-3F are detailed diagrams showing specific steps of the sample operation shown on the time line in FIG. 3A.

The preferred embodiment of the present invention is for a shared TTS resource in a Unified Messaging application. It will be apparent to one of ordinary skill in the art that the principles of the invention can be readily applied to a shared TTS resource in other applications (e.g. an over-the-phone e-mail reading application).

Referring now to FIG. 1, a block diagram of an embodiment of a unified messaging system 100 constructed to take advantage of the present invention is shown. The unified messaging system 100 comprises a set of telephones 110, 112, 114 coupled to a Private Branch Exchange (PBX) 120; a computer network 130 comprising a plurality of computers 132 coupled to an e-mail server 134 via a network line 136, where the e-mail server 134 is additionally coupled to a data storage device 138; and a voice gateway server 140 that is coupled to the network line 136, and coupled to the PBX 120 via a set of telephone lines 142 as well as an integration link 144. The PBX 120 is further coupled to a telephone network via a collection of trunks 122, 124, 126. The unified messaging system 100 shown in FIG. 1 is equivalent to that described in U.S. Pat. No. 5,557,659, entitled "Electronic Mail System Having Integrated Voice Messages," which is incorporated herein by reference. Those skilled in the art will recognize that the teachings of the present invention are applicable to essentially any unified or integrated messaging environment.

In the present invention, conventional software executing upon the computer network 130 provides file transfer services, group access to software applications, as well as an electronic mail (e-mail) system through which a computer user can transfer messages as well as message attachments between their computers 132 via the e-mail server 134. In an exemplary embodiment, Microsoft Exchange™ software (Microsoft Corporation, Redmond, Wash.) executes upon the computer network 130 to provide such functionality. Within the e-mail server 134, an e-mail directory associates each computer user's name with a message storage location,
or "in-box," and a network address, in a manner that will be readily understood by those skilled in the art. The voice gateway server 140 facilitates the exchange of messages between the computer network 130 and a telephone system. Additionally, the voice gateway server 140 provides voice messaging service such as call answering, automated attendant, voice message store and forward, and message inquiry operations to voice messaging subscribers. In the preferred embodiment, each subscriber is a computer user identified in the e-mail directory, that is, having a computer 132 coupled to the network 130. Those skilled in the art will recognize that in an alternate embodiment, the voice messaging subscribers could be a subset of computer users. In yet another alternate embodiment, the computer users could be a subset of a larger pool of voice messaging subscribers, which might be useful when the voice gateway server is primarily used for call answering.

A TTS resource according to the present invention includes the following characteristics. The output of the conversion performed by the TTS resource is digitized audio data which conforms to a known format. The digitized audio data can be played to the user, for example via an ordinary telephone handset. An example format is 64 kilobits per second PCM. According to experimentation and data taken over a variety of users, at normal reading rates approximately 100 characters of text take six seconds to read. Six seconds of digitized audio data is approximately 48 kilobytes of voice data. The preferred TTS resource converts text to speech at speeds faster than real-time. While the conversion process is CPU intensive, it generally occurs in approximately one tenth of the time it takes to read the text, depending on system specification and load.

Callers do not typically listen to the full duration of lengthy e-mail messages. Experience suggests messages are often skipped after 60 seconds or so. Thus, for a 'just-in-time' scheme for converting text to audio data only the initial portions of an e-mail text message should be converted. The system will only continue with the conversion process thereafter if the user continues to listen. In the event the user hangs up or signals that the remainder of the message is not presently wanted, the system will not have wasted resources converting the remainder of the message. One way a user can signal to the system to stop TTS conversion is for example by pressing an appropriate key on the telephone number pad.

Continuing TTS conversion is given a higher priority than conversion of a new message. Preferably, the priority is established through the use of two queues. One queue contains application threads of execution wishing to start a conversion. The second, higher priority queue contains threads wishing to restart.

FIG. 2 shows a sequence chart for illustrating two parallel logic sequences of the present invention. The primary playback process is illustrated as steps 200 to 230. The background conversion process has an asynchronous nature and is illustrated as steps 240 to 290. The present invention interfaces with an Application, e.g., a Unified Messaging system.

In operation, a conversion request and incoming text is received at the step 200. At the step 210, a shared file is created for storing converted audio data. Next, at the step 220, the background conversion process is invoked using the shared file. This shared file is capable of both storing the converted audio data and also simultaneously playing this converted audio data.

Next, the present invention utilizes an InitializationRequestQ in the step 240 which is an initial step in the asynchronous background conversion process. In the step 250, conversion of the text data into converted audible data continues until the difference between the audio pointer and the play pointer is greater than the UnplayedInitializationHighThreshold. If all the text is converted or playback is terminated by the user, then this conversion also terminates.

The present invention queues all initialization requests in an InitializationRequestQ queue. The initialization requests are serviced in the order they are received as a TTS resource becomes available. When the TTS resource becomes available it is allocated for exclusive use. Any initialization request that remains in the InitializationRequestQ queue for longer than a predetermined time MaximumInitWaitTime is rejected with an 'AllResourcesBusy' error and the application is so notified.

In the step 260, the present invention pauses the background conversion process until the difference between the audio pointer and the playback pointer is less than the UnplayedLowThreshold, provided that some text remains unconverted and playback has not been cancelled by the user. When the conversion process is paused, the current position in the text pointer is saved. The TTS resource is released and returned to the TTS Resource Controller for subsequent reallocation.

In the step 270, the present invention utilizes a RestartRequestQ which is for restarting the conversion process after a pause as described above in the step 260. In the step 280, conversion of the text data into converted audible data continues until the difference between the audio pointer and the play pointer is greater than the UnplayedHighThreshold, provided that some text remains unconverted and playback has not been cancelled by the user. The present invention queues this restart on a RestartRequestQ. Next, the process loops back to the step 260 where the conversion process is paused.

The RestartRequestQ queue is provided a higher priority than the InitializationRequestQ queue. In this way, once a TTS resource becomes available the present invention will service the next request in the RestartRequestQ. Any conversions waiting in the InitializationRequestQ will be required to wait until all of the requests in the RestartRequestQ are serviced. The RestartRequestQ conversion is restarted, and continues converting text as before, on sentence boundaries, by sentence, and the output is again stored in the output storage location.

It is possible that the restart will not be serviced (although this is unlikely if correctly configured) before all the converted data has been played back. In this case the request is removed from the RestartRequestQ and an error returned to the calling application.

Conversion is complete when either the caller indicates that he/she does not wish to hear any more converted audio, or all text supplied has been converted. If the user cancels the conversion operation, any in-process conversion operation is canceled, or any queued re-start request is de-queued.

An example is provided of a system that incorporates the teachings of the present invention and is shown in FIGS. 3A to 3F. This example merely shows a specific embodiment of the present invention and does not limit the scope of the present invention. It will be apparent to one of ordinary skill in the art that a system can be provided which supports more or fewer users and which includes more or fewer TTS resources and still follow the spirit and scope of the present invention. For the example system conversion happens at ten times the required playback speed. It will be apparent that the conversion speed is a function of the processor, the text data and system usage, among other factors.

The example system assumes the following values:

UnplayedInitializationHighThreshold=240 kbytes (30 seconds of audio)
UnplayedHighThreshold=160 kbytes (20 seconds of audio)
UnplayedLowThreshold=80 kbytes (10 seconds of audio)
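
The byte and second figures in these values are consistent with the 64 kilobits per second PCM format given as an example earlier, which works out to 8 kbytes of audio per second. A minimal sketch of that arithmetic follows. It assumes 1 kbyte = 1000 bytes, and the function and dictionary names are illustrative only; they do not appear in the described embodiment.

```python
# Illustrative arithmetic only; the names below are assumptions, not from the source.
PCM_BITS_PER_SECOND = 64_000                  # 64 kilobits per second PCM
BYTES_PER_SECOND = PCM_BITS_PER_SECOND // 8   # 8000 bytes of audio per second


def seconds_to_kbytes(seconds: float) -> float:
    """Size in kbytes (1 kbyte = 1000 bytes assumed) of `seconds` of PCM audio."""
    return seconds * BYTES_PER_SECOND / 1000


def kbytes_to_seconds(kbytes: float) -> float:
    """Playback duration in seconds of `kbytes` of PCM audio."""
    return kbytes * 1000 / BYTES_PER_SECOND


# Threshold values of the example system, in kbytes.
THRESHOLDS_KBYTES = {
    "UnplayedInitializationHighThreshold": 240,
    "UnplayedHighThreshold": 160,
    "UnplayedLowThreshold": 80,
}

if __name__ == "__main__":
    for name, kbytes in THRESHOLDS_KBYTES.items():
        print(f"{name}: {kbytes} kbytes = {kbytes_to_seconds(kbytes):.0f} s of audio")
```

The same conversion reproduces the earlier statement that six seconds of digitized audio is approximately 48 kilobytes of voice data.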

FIG. 3A illustrates a timing diagram which shows a
sample operation of the present invention. This example
begins at T0 where conversion of the text message to a
corresponding audio message is initiated. FIG. 3B illustrates
the initiation of the conversion as described at T0 in FIG.
3A. A text buffer 400 illustrates a storage allocation for text 
data which corresponds to a text message. A text pointer 410 
represents a present location of a pointer device relative to 
the text data within the text buffer 400. Preferably, text data 
located prior to the text pointer 410 (to the left of the text 
pointer 410 in FIG. 3B) has been read by the present 
invention, and text data located subsequent to the text 
pointer 410 (to the right of the text pointer 410 in FIG. 3B) 
has not been read by the present invention. As the text data 
is read from the text buffer 400, the text pointer 410 
advances forward (graphically shown in FIG. 3B as toward
the right of the text pointer 410.)

An audio buffer 420 illustrates a storage allocation for 
audio data which corresponds to converted text data from 
the text buffer 400. The audio data is an audible represen- 
tation of the text data. An audio pointer 430 represents a 
present location of a pointer device relative to the audio data 
within the audio buffer 420. Preferably, the audio data 
located prior to the audio pointer 430 (to the left of the audio 
pointer 430 in FIG. 3B) corresponds to audio data that has 
been written by the present invention and corresponds to the 
text data in the text buffer 400 prior to the text pointer 410. 
Preferably, the audio data located subsequent to the audio 
pointer 430 (to the right of the audio pointer 430 in FIG. 3B) 
corresponds to audio data which has not been written by the 
present invention and does not necessarily correspond to the 
text data in the text buffer 400. As the text data is converted 
from the text data within the text buffer 400 and written as 
audio data into the audio buffer 420, the audio pointer 430 
advances forward (graphically shown in FIG. 3B as toward 
the right of the audio pointer 430.) 

A playback pointer 440 represents a present location of a 
pointer device relative to the audio data within the audio 
buffer 420. Preferably, the audio data located prior to the 
playback pointer 440 (to the left of the playback pointer 440 
in FIG. 3B) corresponds to audio data that has been audibly 
played to the listener by the present invention and corre- 
sponds to an audible representation of the textual data in the 
text buffer 400 prior to the text pointer 410. Preferably, the 
audio data located subsequent to the playback pointer 440
(to the right of the playback pointer 440 in FIG. 3B) corresponds
to audio data which has not been played by the present 
invention and may correspond to an audible representation 
of the textual data in the text buffer 400, depending on the 
location of the audio pointer 430 relative to the playback 
pointer 440. As the audio data in the audio buffer 420 is 
audibly played back, the playback pointer 440 advances 
forward (graphically shown in FIG. 3B as toward the right.) 

According to FIG. 3B, at the start of conversion at T0, the
text pointer 410, the audio pointer 430 and the playback 
pointer 440 are all at their initial start positions. For 
example, the text pointer 410 is preferably located at a far 
leftmost position of the text buffer 400. Additionally, the 
audio pointer 430 and the playback pointer 440 are prefer- 
ably located at a far leftmost position of the audio buffer 420. 
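
The relationship between the three pointers can be sketched as a small data structure. This is an illustrative model only: the class name `ConversionState`, its field and method names, and the choice of characters and bytes as units are assumptions, not part of the described embodiment.

```python
from dataclasses import dataclass


@dataclass
class ConversionState:
    """Hypothetical model of the pointers into text buffer 400 and audio buffer 420."""
    text_length: int           # characters held in the text buffer
    text_pointer: int = 0      # text before this offset has been read
    audio_pointer: int = 0     # audio bytes before this offset have been written
    playback_pointer: int = 0  # audio bytes before this offset have been played

    def convert(self, chars: int, audio_bytes: int) -> None:
        """Record that `chars` of text were converted into `audio_bytes` of audio."""
        if self.text_pointer + chars > self.text_length:
            raise ValueError("cannot read past the end of the text buffer")
        self.text_pointer += chars
        self.audio_pointer += audio_bytes

    def play(self, audio_bytes: int) -> None:
        """Advance playback; it can never pass the audio already written."""
        if self.playback_pointer + audio_bytes > self.audio_pointer:
            raise ValueError("cannot play audio that has not been written")
        self.playback_pointer += audio_bytes

    @property
    def unplayed(self) -> int:
        """Converted but not yet played audio (AudioPointer - PlayPointer)."""
        return self.audio_pointer - self.playback_pointer

    @property
    def all_text_converted(self) -> bool:
        return self.text_pointer >= self.text_length
```

A freshly constructed `ConversionState` has all three pointers at zero, which corresponds to the leftmost start positions described for T0. The `unplayed` property is the quantity that the thresholds are compared against.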

FIG. 3C illustrates the positions of the text pointer 410, 
the audio pointer 430 and the playback pointer 440 at T1 as
shown in FIG. 3A. At T1, conversion of a portion of the text
data within the text buffer 400 into the corresponding audio
data within the audio buffer 420 is completed. At T1, the
present invention is ready to start audio playback of the
audio data within the audio buffer 420. As shown in FIG. 3C,
the text pointer 410 has advanced towards the right within 
the text buffer 400 and indicates where the present invention 
stopped reading the text information within the text buffer 
400. Further, the audio pointer 430 has also advanced 
towards the right within the audio buffer 420 and indicates 
the relative location within the audio buffer 420 where the 
audio data which corresponds to the text data has been 
written. 

FIG. 3D illustrates the positions of the text pointer 410, 
the audio pointer 430 and the playback pointer 440 at T2 as 
shown in FIG. 3A. At T2, initial playback of the audio data
within the audio buffer 420 is underway. The text pointer 
410 has moved farther to the right within the text buffer 400 
representing that an additional portion of the text data within 
the text buffer 400 has been read by the present invention. 
Similarly, the audio pointer 430 has also moved farther to
the right within the audio buffer 420, representing that an
additional portion of the audio data, corresponding to this
additional portion of the text data, has been written to the
audio buffer 420. Having started playback of the audio data
within the audio buffer 420, the playback pointer 440 has 
also moved towards the right within the audio buffer 420. 

A threshold level 450 is measured by calculating the 
positional difference between the audio pointer 430 and the 
playback pointer 440. In this case, the threshold level 450 is 
classified as an UnplayedInitializationHighThreshold. This
signifies that the present invention currently has converted 
an adequate amount of text data from the text buffer 400 into 
audio data in the audio buffer 420. Preferably because of the 
threshold level 450, both the text pointer 410 and the audio 
pointer 430 are temporarily frozen which restricts the text 
data within the text buffer 400 from additional conversion
into corresponding audio data. 

FIG. 3E illustrates the positions of the text pointer 410, 
the audio pointer 430 and the playback pointer 440 at T3 as 
shown in FIG. 3A. Similar to the threshold level 450, a 
threshold level 460 is measured by calculating the positional 
difference between the audio pointer 430 and the playback 
pointer 440. In this case, the threshold level 460 is classified 
as an UnplayedLowThreshold. This signifies that the present 
invention currently does not have an adequate amount of 
converted audio data in the audio buffer 420 which corre- 
sponds to the text data within the text buffer 400. Because 
of the threshold level 460, the text pointer 410 preferably 
advances towards the right of the text buffer 400 and reads an
additional portion of the text data. Similarly, the audio 
pointer 430 also advances towards the right of the audio 
buffer 420 and writes an additional portion of the audio data 
to the audio buffer 420. This additional portion of the audio 
data represents this additional portion of the text data. 
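
How long a TTS resource is held during one such restart can be estimated from the thresholds and the conversion speed. The sketch below is a simplified model under stated assumptions: playback runs continuously at one second of audio per second, conversion runs at a constant multiple of playback speed, and queueing delay and sentence-boundary overshoot are ignored. The function name `restart_duty_cycle` is hypothetical.

```python
from typing import Tuple


def restart_duty_cycle(low_s: float, high_s: float, speedup: float) -> Tuple[float, float, float]:
    """Return (convert_time, drain_time, duty_cycle) for one pause/restart cycle.

    low_s, high_s: unplayed-audio thresholds expressed in seconds of audio
    speedup:       conversion speed as a multiple of playback speed (must be > 1)
    """
    if speedup <= 1:
        raise ValueError("conversion must be faster than playback")
    gap = high_s - low_s
    # While converting, unplayed audio grows by (speedup - 1) seconds per second,
    # because playback keeps consuming one second of audio per second.
    convert_time = gap / (speedup - 1)
    # While paused, playback drains one second of audio per second.
    drain_time = gap
    duty_cycle = convert_time / (convert_time + drain_time)
    return convert_time, drain_time, duty_cycle


if __name__ == "__main__":
    convert_time, drain_time, duty = restart_duty_cycle(low_s=10, high_s=20, speedup=10)
    print(f"convert {convert_time:.2f} s, drain {drain_time:.0f} s, duty cycle {duty:.0%}")
```

With the example thresholds of 10 and 20 seconds of audio and conversion at ten times playback speed, this model has the resource busy for roughly 1.1 seconds out of every 11.1 seconds of listening, so in this idealized case one resource could in principle be shared among several concurrent listeners.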

FIG. 3F illustrates the positions of the text pointer 410, the 
audio pointer 430 and the playback pointer 440 at T4 as 
shown in FIG. 3A. At T4, playback of the audio data within
the audio buffer 420 is underway. The text pointer 410 has 
moved farther to the right within the text buffer 400 relative 
to the text pointer 410 at T3. By moving farther right, the 
text pointer 410 represents that an additional portion of the 
text data within the text buffer 400 has been read by the 
present invention. Similarly, the audio pointer 430 has also 
moved farther to the right within the audio buffer 420 
relative to the audio pointer 430 at T3. By moving farther 
right, the audio pointer 430 represents that an additional 
portion of the audio data within the audio buffer 420
corresponds to this additional portion of the text data. 
Having continued playback of the audio data within the 
audio buffer 420, the playback pointer 440 has also moved 
towards the right within the audio buffer 420 relative to the
playback pointer 440 at T3.

Similar to the threshold levels 450 and 460, a threshold 
level 470 is measured by calculating the positional differ- 
ence between the audio pointer 430 and the playback pointer 
440. In this case, the threshold level 470 is classified as an 
UnplayedHighThreshold. This signifies that the present
invention currently has converted an adequate amount of 
text data from the text buffer 400 into audio data in the audio 
buffer 420. Preferably because of the threshold level 470, 
both the text pointer 410 and the audio pointer 430 are 
temporarily frozen which restricts converting additional text
data from the text buffer 400 into corresponding audio data.
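
Taken together, the threshold levels 450, 460 and 470 describe a hysteresis loop: conversion runs until the unplayed audio exceeds a high threshold, pauses, and resumes once playback drains the unplayed audio below the low threshold. A minimal sketch of that decision follows. It assumes byte counts with 1 kbyte = 1000 bytes, and the function `next_action` and its parameters are hypothetical names introduced for illustration.

```python
# Threshold values from the example system, in bytes (1 kbyte = 1000 bytes assumed).
UNPLAYED_INIT_HIGH = 240_000
UNPLAYED_HIGH = 160_000
UNPLAYED_LOW = 80_000


def next_action(converting: bool, unplayed: int, first_pass: bool, done: bool) -> str:
    """Return 'stop', 'convert' or 'pause' for the background conversion process.

    converting: whether the process is currently converting
    unplayed:   AudioPointer - PlayPointer, in bytes
    first_pass: True until the initial high threshold has been reached once
    done:       all text converted, or playback cancelled by the user
    """
    if done:
        return "stop"
    if converting:
        high = UNPLAYED_INIT_HIGH if first_pass else UNPLAYED_HIGH
        # Keep converting until enough unplayed audio is buffered.
        return "pause" if unplayed > high else "convert"
    # Paused: stay paused until playback drains the buffer below the low threshold.
    return "convert" if unplayed < UNPLAYED_LOW else "pause"
```

In the described process a transition from paused to converting is not immediate: it corresponds to placing a request in the RestartRequestQ and waiting for a TTS resource to be allocated. The gap between the low and high thresholds is what gives the resource time to serve other requests.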

In this particular example, at T5 as shown in FIG. 3A, the
user preferably cancels the playback of the written message. 
Accordingly, conversion of the remaining written message 
into audible data is immediately aborted and the present
invention conserves TTS resources. 

Unlike a conventional multi-tasking approach to resource 
management, the present invention takes into consideration 
that not all users will listen to the entirety of a message. 
Further, because the conversion rate is somewhat faster than
real-time, and the text messages are parsed into grammatical 
units (sentences) the utilization of the system is better than 
a conventional multi-tasking system. The provision of a 
double queue providing higher priority to continuing con- 
version further enhances the efficiency of the system. 
Further, the present invention utilizes a shared storage
device for simultaneously storing converted text data and 
audibly playing this converted text data. 
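
The double-queue arrangement can be sketched as a dispatcher that always drains restart requests before initialization requests and rejects initialization requests that have waited too long. This is a sketch under assumptions: the class `TTSController`, its method names, the use of caller-supplied timestamps, and the lazy expiry check at dispatch time are hypothetical simplifications. Only the queue names, MaximumInitWaitTime and the 'AllResourcesBusy' error come from the description.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class TTSController:
    """Hypothetical two-queue dispatcher for a pool of TTS resources."""

    def __init__(self, maximum_init_wait_time: float) -> None:
        self.maximum_init_wait_time = maximum_init_wait_time
        # New conversions: (request id, time at which it was queued).
        self.initialization_request_q: Deque[Tuple[str, float]] = deque()
        # Continuing conversions, serviced before any new conversion.
        self.restart_request_q: Deque[str] = deque()
        self.rejected: List[Tuple[str, str]] = []

    def request_initialization(self, request_id: str, now: float) -> None:
        self.initialization_request_q.append((request_id, now))

    def request_restart(self, request_id: str) -> None:
        self.restart_request_q.append(request_id)

    def cancel(self, request_id: str) -> None:
        """De-queue any pending request for a conversion the user cancelled."""
        self.restart_request_q = deque(
            r for r in self.restart_request_q if r != request_id
        )
        self.initialization_request_q = deque(
            (r, t) for r, t in self.initialization_request_q if r != request_id
        )

    def next_request(self, now: float) -> Optional[str]:
        """Called when a TTS resource becomes available; returns the request to service."""
        if self.restart_request_q:
            return self.restart_request_q.popleft()
        while self.initialization_request_q:
            request_id, queued_at = self.initialization_request_q.popleft()
            if now - queued_at > self.maximum_init_wait_time:
                # Waited too long: reject so the application can be notified.
                self.rejected.append((request_id, "AllResourcesBusy"))
                continue
            return request_id
        return None
```

For example, with a hypothetical wait limit of 10 seconds, a restart request queued after two initialization requests is still returned first by `next_request`, and an initialization request that has waited longer than the limit is recorded in `rejected` rather than serviced.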

The present invention has been described in terms of 
specific embodiments incorporating details to facilitate the 
understanding of the principles of construction and operation
of the invention. Such reference herein to specific
embodiments and details thereof is not intended to limit the 
scope of the claims appended hereto. It will be apparent to 
those skilled in the art that modifications can be made in the 
embodiment chosen for illustration without departing from
the spirit and scope of the invention. Specifically, it will be 
apparent to one of ordinary skill in the art that the device of 
the present invention could be implemented in several 
different ways and the apparatus disclosed above is only 
illustrative of the preferred embodiment of the invention and
is in no way a limitation. 

What is claimed is: 

1. An architecture for managing a plurality of text-to- 
speech (TTS) resources, the TTS resources for converting
text provided by an application for subsequent presentation
as audio speech to a user, the architecture comprising:

a. a TTS controller coupled to allocate the TTS resources,
the TTS controller further coupled to receive a new 
conversion request from the application; 

b. a first queue coupled to receive each new conversion 
request from the TTS controller;

c. a shareable storage element coupled to receive and for 
storing a converted message, wherein the shareable 
storage element is coupled for access to both the 
application and the TTS resource; 

d. the TTS controller including means for determining
when a TTS resource becomes available and for 
instructing an available TTS resource to convert the 
text message according to sentence boundaries; and 

e. a second queue coupled to receive a continuing conversion
request, wherein the continuing conversion request has a
higher priority than the new conversion
request. 
2. The architecture according to claim 1 further comprising
means for determining an amount of unplayed converted
data wherein a conversion operation ceases upon reaching a 
predetermined upper threshold of the amount of unplayed 
converted data. 

3. The architecture according to claim 1 wherein the 
application is a unified messaging system. 

4. The architecture according to claim 2 wherein a con- 
version operation will resume after the amount of unplayed 
converted data falls below a predetermined lower threshold 
of the amount of unplayed converted data. 

5. A TTS controller coupled for managing a plurality of 
text-to-speech (TTS) resources, the TTS resources for converting
text provided by an application for subsequent
presentation as audio speech to a user, the TTS comprising: 

a. means for determining whether a new conversion is 
required and for providing an indication in a first queue 
in response thereto; 

b. means for determining whether a TTS resource is 
available, and for instructing a resource to initiate a 
conversion upon such a determination; 

c. means for controlling the conversion to continue until 
at least a predetermined amount of text is converted, 
but for continuing until completion of a grammatical 
boundary; 

d. means for stopping the conversion upon determining 
that the predetermined amount of text was converted, 
and for causing the application to playback a converted 
audio message; 

e. means for determining whether a continuing conversion 
is required and for providing an indication to a second 
queue in response thereto, wherein an indication in the 
second queue has a higher priority than an indication in 
the first queue. 

6. The architecture according to claim 5 further compris- 
ing means for determining an amount of unplayed converted 
data wherein a conversion operation ceases upon reaching a 
predetermined upper threshold of the amount of unplayed 
converted data. 

7. The architecture according to claim 5 wherein the 
application is a unified messaging system. 

8. The architecture according to claim 7 wherein a con- 
version operation will resume after the amount of unplayed 
converted data falls below a predetermined lower threshold 
of the amount of unplayed converted data. 

9. A method of managing a plurality of text-to-speech 
(TTS) resources, the TTS resources for converting text 
provided by an application for subsequent presentation as 
audio speech to a user, the TTS comprising: 

a. determining whether a new conversion is required and 
for providing an indication in a first queue in response 
thereto; 

b. determining whether a TTS resource is available, and 
for instructing a resource to initiate a conversion upon 
such a determination; 

c. controlling the conversion to continue until at least a 
predetermined amount of text is converted, but for 
continuing until completion of a grammatical bound- 
ary; 

d. stopping the conversion upon determining that the
predetermined amount of text was converted, and for 
causing the application to playback a converted audio 
message; 

e. determining whether a continuing conversion is 
required and for providing an indication to a second 
queue in response thereto, wherein an indication in the 
second queue has a higher priority than an indication in 
the first queue. 

***** 